A Dataset Generator for Whole Genome Shotgun Sequencing
Abstract
Simulated data sets have been found to be useful in developing software systems because (1) they allow one to study the effect of a particular phenomenon in isolation, and (2) one has complete information about the true solution against which to measure the results of the software. In developing a software suite for assembling a whole human genome shotgun data set, we have developed a simulator, celsim, that permits one to describe and stochastically generate a target DNA sequence with a variety of repeat structures, to further generate polymorphic variants if desired, and to generate a shotgun data set that might be sampled from the target sequence(s). We have found the tool invaluable and quite powerful, yet the design is extremely simple, employing a special type of stochastic grammar.
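The three stages named in the abstract (a target sequence with repeats, polymorphic variants, and shotgun sampling) can be illustrated with a minimal Python sketch. This is not celsim and does not reproduce its stochastic grammar, which the abstract does not describe; every function name, parameter, and modelling choice below (a single repeat family, substitution-only variants, fixed-length error-free reads) is a hypothetical simplification chosen for illustration.

```python
import random

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def random_dna(length, rng):
    """Uniformly random DNA string of the given length."""
    return "".join(rng.choice(BASES) for _ in range(length))


def make_target(length, repeat_len, copies, rng):
    """Random target containing `copies` exact copies of one repeat element.

    Assumption: a single repeat family inserted at random positions; a real
    simulator would support far richer repeat structures.
    """
    unique_len = length - repeat_len * copies
    if unique_len < 0:
        raise ValueError("repeats do not fit in the requested length")
    repeat = random_dna(repeat_len, rng)
    backbone = random_dna(unique_len, rng)
    cuts = sorted(rng.randint(0, unique_len) for _ in range(copies))
    pieces, prev = [], 0
    for cut in cuts:
        pieces.append(backbone[prev:cut])
        pieces.append(repeat)
        prev = cut
    pieces.append(backbone[prev:])
    return "".join(pieces), repeat


def polymorphic_variant(seq, snp_rate, rng):
    """Copy of `seq` in which each base is substituted with probability snp_rate."""
    out = []
    for base in seq:
        if rng.random() < snp_rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def shotgun_reads(target, coverage, read_len, rng):
    """Sample fixed-length reads uniformly from either strand of the target.

    Returns (start, is_forward, read) tuples so that the true placement of
    every read is known, which is the point of simulating the data.
    """
    n_reads = int(coverage * len(target) / read_len)
    reads = []
    for _ in range(n_reads):
        start = rng.randint(0, len(target) - read_len)
        fragment = target[start:start + read_len]
        is_forward = rng.random() < 0.5
        read = fragment if is_forward else reverse_complement(fragment)
        reads.append((start, is_forward, read))
    return reads


if __name__ == "__main__":
    rng = random.Random(0)
    target, repeat = make_target(2000, 100, 3, rng)
    variant = polymorphic_variant(target, 0.01, rng)
    reads = shotgun_reads(target, coverage=5, read_len=200, rng=rng)
    print(len(target), len(variant), len(reads))
```

Because each read carries its true start position and strand, an assembler's output can be checked directly against the known layout, which corresponds to benefit (2) in the abstract.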
Similar resources
A reference bacterial genome dataset generated on the MinION™ portable single-molecule nanopore sequencer
BACKGROUND The MinION™ is a new, portable single-molecule sequencer developed by Oxford Nanopore Technologies. It measures four inches in length and is powered from the USB 3.0 port of a laptop computer. The MinION™ measures the change in current resulting from DNA strands interacting with a charged protein nanopore. These measurements can then be used to deduce the underlying nucleotide sequen...
Estimation of the Redundancy in Human Genome Shotgun Sequencing by a Monte-Carlo Simulation
In order to quantitatively comprehend the essence of whole genome shotgun sequencing, a Monte-Carlo simulation was carried out. It was estimated that even a vast genome such as human genome can be sequenced at a moderate redundancy ( 7) with a satisfactory accuracy (10 error rate), resulting in a high sequencing speed and much lower cost. Switching from a random process (i.e., shotgun) to a dir...
Whole-genome shotgun sequencing of a colonizing multilocus sequence type 17 Streptococcus agalactiae strain.
This report highlights the whole-genome shotgun draft sequence for a Streptococcus agalactiae strain representing multilocus sequence type (ST) 17, isolated from a colonized woman at 8 weeks postpartum. This sequence represents an important addition to the published genomes and will promote comparative genomic studies of S. agalactiae recovered from diverse sources.
Optimized multiplex PCR: efficiently closing a whole-genome shotgun sequencing project.
A new method has been developed for rapidly closing a large number of gaps in a whole-genome shotgun sequencing project. The method employs multiplex PCR and a novel pooling strategy to minimize the number of laboratory procedures required to sequence the unknown DNA that falls in between contiguous sequences. Multiplex sequencing, a novel procedure in which multiple PCR primers are used in a s...
Analyzing WGBS with the bsseq package
This document discusses the ins and outs of an analysis of a whole-genome shotgun bisulfite sequencing (WGBS) dataset, using the BSmooth algorithm, which was first used in [1] and more formally presented and evaluated in [2]. The intention with the document is to focus on analysis-related tasks and questions. Basic usage of the bsseq package is covered in “The bsseq user’s guide”. It may be use...
Journal: Proceedings. International Conference on Intelligent Systems for Molecular Biology
Publication date: 1999